CLAIMS 

What is claimed is: 

1. In a computer system including a processor which contains a 
first set of N-bit data elements loaded into a first register and a second set of 
N-bit data elements loaded into a second register, a method for providing 
extended precision in single instruction multiple data (SIMD) arithmetic 
operations, comprising the steps of: 

fetching an arithmetic instruction from a memory unit; 

decoding the arithmetic instruction and reading the first vector register 
and the second vector register; 

executing the arithmetic instruction on corresponding N-bit data 
elements in the first register and second register to produce corresponding 
resulting elements; 

writing the resulting elements into corresponding elements of an 
accumulator; 

transforming each resulting element in the accumulator into N bits; and 

writing the transformed elements of N-bit width into a third register. 
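
The steps recited in Claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes N is 8 (Claim 15), 24-bit signed accumulator elements (Claim 17), multiplication as the arithmetic instruction (Claim 4), and a transformation that only clamps. Python lists stand in for registers, and the names `wrap_signed`, `clamp_signed` and `simd_multiply` are hypothetical.

```python
N = 8          # assumed element width (Claim 15)
ACC_BITS = 24  # assumed accumulator element width (Claim 17)


def wrap_signed(value, bits):
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def clamp_signed(value, bits):
    """Saturate an integer to the signed range of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def simd_multiply(first_reg, second_reg):
    # Execute the instruction on corresponding elements.
    results = [a * b for a, b in zip(first_reg, second_reg)]
    # Write the results into the wider accumulator elements.
    accumulator = [wrap_signed(r, ACC_BITS) for r in results]
    # Transform each accumulator element back to N bits (clamping only here).
    third_reg = [clamp_signed(x, N) for x in accumulator]
    return accumulator, third_reg


first = [100, -100, 3, 0, 127, -128, 5, -7]
second = [2, 2, 4, 9, 127, 127, -5, -7]
acc, third = simd_multiply(first, second)
```

In this sketch the accumulator keeps the full products (for example 16129), while the third register receives values limited to the 8-bit range.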



2. The method as recited in Claim 1, wherein said decoding step 
further comprises the steps of: 

selecting an element from the second register; and 

copying the selected element into the other elements in the second 
register. 

3. The method as recited in Claim 1, wherein said arithmetic 
instruction is an addition of corresponding vector elements in the first and 
second vector registers. 



4. The method as recited in Claim 1, wherein said arithmetic 
instruction is a multiplication of corresponding vector elements in the first 
and second vector registers. 

5. The method as recited in Claim 1, wherein said arithmetic 
instruction is a subtraction of second vector register elements from the first 
vector register elements. 






6. The method as recited in Claim 1, wherein said accumulator is a 
register having an integer multiple of 64-bit width. 






7. The method as recited in Claim 1, wherein said accumulator is a 
register of 192-bits. 



8. The method as recited in Claim 1, wherein said transformation 
step further comprises the steps of: 

scaling the resulting elements in the accumulator by shifting the 
values in the resulting elements; 

rounding the scaled resulting elements in the accumulator; and 

clamping the rounded resulting elements. 
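
The scaling, rounding and clamping steps of Claim 8 might be sketched as follows. The claim does not fix a rounding mode or signedness, so this sketch assumes signed values, a right shift as the scaling, and round-half-up. The function name `scale_round_clamp` and its parameters are hypothetical.

```python
def scale_round_clamp(acc_value, shift, n_bits=8):
    """Scale by a right shift, round to nearest, then clamp to n_bits (signed)."""
    if shift > 0:
        # Adding half of the discarded range before shifting rounds half up.
        rounded = (acc_value + (1 << (shift - 1))) >> shift
    else:
        rounded = acc_value
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    # Clamp (saturate) to the representable N-bit range.
    return max(lo, min(hi, rounded))


examples = [
    scale_round_clamp(16129, 7),   # 16129 / 128 is about 126.01
    scale_round_clamp(-16256, 7),  # -16256 / 128 is exactly -127
    scale_round_clamp(5, 1),       # 2.5 rounds up to 3
    scale_round_clamp(200, 0),     # no scaling, clamped to 127
]
```

Python's `>>` on negative integers is an arithmetic (floor) shift, which is what makes the add-then-shift rounding work for both signs.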

9. The method as recited in Claim 1, wherein said third register 
writing step further comprises the steps of: 

reading a portion of the accumulator elements; and 

writing the portion of the accumulator elements into the 
corresponding elements of said third register. 

10. The method as recited in Claim 9, wherein the portion is either 
the low third bits or the high third bits of the elements in the accumulator. 

11. The method as recited in Claim 1, wherein the values in the 
resulting elements are wrapped around the representable range of the 
accumulator elements. 
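
Wrapping a value around the representable range of an accumulator element, as recited in Claim 11, can be illustrated with the sketch below. It assumes 24-bit elements interpreted as signed two's-complement values. The claim itself does not state signedness, and `wrap_signed` is a hypothetical helper.

```python
def wrap_signed(value, bits=24):
    """Wrap an integer around the signed range of a `bits`-wide element."""
    value &= (1 << bits) - 1      # keep only the low `bits` bits
    if value & (1 << (bits - 1)): # sign bit set: value is negative
        value -= 1 << bits
    return value


MAX_24 = (1 << 23) - 1   # 8388607
MIN_24 = -(1 << 23)      # -8388608

overflowed = wrap_signed(MAX_24 + 1)
underflowed = wrap_signed(MIN_24 - 1)
```

One step past the largest value lands on the smallest value, and one step below the smallest lands on the largest, which is the wraparound behavior (as opposed to clamping).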



12. The method as recited in Claim 1, wherein the data elements are 
integers. 



13. The method as recited in Claim 1, wherein the first register, the 
second register, and the third register are floating point registers. 






14. The method as recited in Claim 1, wherein the first register, the 
second register, and the third register are each 64-bit wide. 

15. The method as recited in Claim 1, wherein N is 8. 

16. The method as recited in Claim 1, wherein N is 16. 



17. The method as recited in Claim 15, wherein the elements in the 
accumulator are each 24-bit wide. 

18. The method as recited in Claim 16, wherein the elements in the 
accumulator are each 48 bit wide. 
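
The element widths recited above are consistent with a 64-bit register (Claim 14) and a 192-bit accumulator (Claim 7), as the following arithmetic sketch shows. It assumes the accumulator holds exactly one element per register lane. The constant and function names are hypothetical.

```python
REGISTER_BITS = 64      # register width from Claim 14
ACCUMULATOR_BITS = 192  # accumulator width from Claim 7


def accumulator_element_width(n_bits):
    """Width of one accumulator element when each N-bit lane gets one element."""
    lanes = REGISTER_BITS // n_bits   # number of N-bit elements per register
    return ACCUMULATOR_BITS // lanes


width_for_8 = accumulator_element_width(8)    # 8 lanes
width_for_16 = accumulator_element_width(16)  # 4 lanes
```

Under these assumptions each accumulator element is three times as wide as the data element it corresponds to.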

19. The method as recited in Claim 1, wherein said third register 
writing step further comprises the steps of: 

reading a portion of the accumulator elements; and 

writing the portion of the accumulator elements into the 
corresponding elements of said third register. 



20. The method as recited in Claim 19, wherein the portion is 
chosen from the low third bits, the middle third bits, or the high third bits of 
the elements in the accumulator. 
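
Reading the low, middle, or high third of an accumulator element could look like the sketch below. It assumes a 24-bit element treated as a raw bit pattern, so each third is 8 bits wide. The helper `read_third` and its string selector are hypothetical.

```python
def read_third(acc_element, which, acc_bits=24):
    """Return the low, middle, or high third of an accumulator element's bits."""
    third = acc_bits // 3
    shift = {"low": 0, "middle": third, "high": 2 * third}[which]
    # Shift the wanted third down to bit 0 and mask off everything above it.
    return (acc_element >> shift) & ((1 << third) - 1)


element = 0x123456
low = read_third(element, "low")
middle = read_third(element, "middle")
high = read_third(element, "high")
```

Writing the selected third of every accumulator element into the corresponding element of the third register would then be a per-element loop over this helper.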






21. In a computer system including a processor which contains a 
first set of N-bit data elements loaded into a first register, a second set of N-bit 
data elements loaded into a second register, and an accumulator having a 
third set of data elements, a method for providing extended precision in 
single instruction multiple data (SIMD) arithmetic operations, comprising the 
steps of: 

fetching an arithmetic instruction from a memory unit; 

decoding the arithmetic instruction and reading the first vector register 
and the second vector register; 

executing the arithmetic instruction on corresponding data elements in 
the first and second vector registers to produce corresponding resulting 
elements; 

adding the resulting elements to the corresponding elements in the 
accumulator; 

writing the resulting elements into the accumulator; 



transforming each resulting element in the accumulator into an N-bit 
width element; and 

writing the transformed elements of N-bit width into a third register. 
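
The accumulating step of Claim 21, in which results are added to the elements already held in the accumulator, can be sketched as a repeated multiply-accumulate. The sketch assumes N is 16 with four lanes and uses unbounded Python integers in place of 48-bit accumulator elements. The function name `multiply_accumulate` is hypothetical.

```python
def multiply_accumulate(accumulator, first_reg, second_reg):
    """Add the elementwise products to the corresponding accumulator elements."""
    return [acc + a * b for acc, a, b in zip(accumulator, first_reg, second_reg)]


first = [100, -200, 300, 50]
second = [100, 100, -100, 2]

accumulator = [0, 0, 0, 0]
for _ in range(4):
    # Each pass stands in for one executed SIMD instruction.
    accumulator = multiply_accumulate(accumulator, first, second)
```

After four passes several sums exceed the signed 16-bit range (for example 40000), yet they remain exact in the wider accumulator until the final transformation to N bits.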

22. The method as recited in Claim 21, wherein said decoding step 
further comprises the steps of: 

selecting an element from the second register; and 

copying the selected element into the other elements in the second 
register. 
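
Selecting one element of the second register and copying it into the other elements amounts to a broadcast, which lets a vector be combined with a single scalar operand. The sketch below is illustrative only. A Python list stands in for the register, and `broadcast_element` and its `index` parameter are hypothetical.

```python
def broadcast_element(register, index):
    """Return a register image in which every element equals register[index]."""
    return [register[index]] * len(register)


second = [1, 2, 3, 4]
broadcast = broadcast_element(second, 2)

# The broadcast operand can then be used in an ordinary elementwise operation.
first = [10, 20, 30, 40]
products = [a * b for a, b in zip(first, broadcast)]
```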



23. The method as recited in Claim 21, wherein said arithmetic 
instruction is an addition of corresponding vector elements in the first and 
second vector registers. 



24. The method as recited in Claim 21, wherein said arithmetic 
instruction is a multiplication of corresponding vector elements in the first 
and second vector registers. 

25. The method as recited in Claim 21, wherein said arithmetic 
instruction is a subtraction of second vector register elements from the first 
vector register elements. 



26. The method as recited in Claim 21, wherein said accumulator is 
a register having an integer multiple of 64-bit width. 

27. The method as recited in Claim 21, wherein said accumulator is 
a register of 192-bits. 






28. The method as recited in Claim 21, wherein said transformation 
step further comprises the steps of: 

scaling the resulting elements in the accumulator by shifting the 
values in the resulting elements; 

rounding the scaled resulting elements in the accumulator; and 

clamping the rounded resulting elements. 



29. The method as recited in Claim 21, wherein said third register 
writing step further comprises the steps of: 

reading a portion of the accumulator elements; and 

writing the portion of the accumulator elements into the 
corresponding elements of said third register. 

30. The method as recited in Claim 29, wherein the portion is either 
the low third bits or high third bits of the elements in the accumulator. 



31. The method as recited in Claim 21, wherein the values in the 
resulting elements are wrapped around the representable range of the 
accumulator elements. 



32. The method as recited in Claim 21, wherein the data elements 
are integers. 

33. The method as recited in Claim 21, wherein the first register, the 
second register, and the third register are floating point registers. 

34. The method as recited in Claim 21, wherein the first register, the 
second register, and the third register are each 64-bit wide. 

35. The method as recited in Claim 21, wherein N is 8. 

36. The method as recited in Claim 21, wherein N is 16. 



37. The method as recited in Claim 35, wherein the elements in the 
accumulator are each bit wide. 



38. The method as recited in Claim 36, wherein the elements in the 
accumulator are each 48 bit wide. 



39. The method as recited in Claim 21, wherein said third register 
writing step further comprises the steps of: 

reading a portion of the accumulator elements; and 

writing the portion of the accumulator elements into the 
corresponding elements of said third register. 



40. The method as recited in Claim 39, wherein the portion is 
chosen from the low third bits, the middle third bits, or the high third bits of 
the 